Ontology Learning with LLMs: A Benchmark Study on Axiom Identification
Bakker, Roos M., Di Scala, Daan L., de Boer, Maaike H. T., Raaijmakers, Stephan A.
Ontologies are an important tool for structuring domain knowledge, but their development is a complex task that requires significant modelling and domain expertise. Ontology learning, which aims to automate this process, has advanced in the past decade with improvements in Natural Language Processing techniques, and especially with the recent growth of Large Language Models (LLMs). This paper investigates the challenge of identifying axioms: fundamental ontology components that define logical relations between classes and properties. In this work, we introduce OntoAxiom, an Ontology Axiom Benchmark, and systematically test LLMs on it for axiom identification, evaluating different prompting strategies, ontologies, and axiom types. The benchmark consists of nine medium-sized ontologies comprising 17,118 triples and 2,771 axioms in total. We focus on subclass, disjoint, subproperty, domain, and range axioms. To evaluate LLM performance, we compare twelve LLMs across three shot settings and two prompting strategies: a Direct approach, where we query all axioms at once, versus an Axiom-by-Axiom (AbA) approach, where each prompt queries for one axiom only. Our findings show that AbA prompting leads to higher F1 scores than the Direct approach. However, performance varies across axiom types, suggesting that certain axioms are more challenging to identify. The domain also influences performance: the FOAF ontology achieves a score of 0.642 for the subclass axiom, while the Music Ontology reaches only 0.218. Larger LLMs outperform smaller ones, but smaller models may still be viable for resource-constrained settings. Although overall performance is not high enough to fully automate axiom identification, LLMs can provide valuable candidate axioms to support ontology engineers in developing and refining ontologies.
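The Axiom-by-Axiom strategy described in the abstract can be sketched as a simple query loop. This is a minimal illustration, not the paper's implementation: `ask` stands in for any LLM completion call, and the prompt wording and class names are invented for the example.

```python
AXIOM_TYPES = ["subclass", "disjoint", "subproperty", "domain", "range"]

def aba_query(ontology_name, candidate, axiom_type):
    """Build one Axiom-by-Axiom prompt for a single candidate pair."""
    subj, obj = candidate
    return (
        f"In the {ontology_name} ontology, does the axiom "
        f"'{subj} {axiom_type} {obj}' hold? Answer yes or no."
    )

def identify_axioms(ontology_name, candidates, axiom_type, ask):
    """AbA loop: one LLM call per candidate, keeping affirmed pairs.

    `ask` is any callable that sends a prompt to an LLM and returns its
    text reply (a placeholder here, not an API from the paper).
    """
    accepted = []
    for pair in candidates:
        reply = ask(aba_query(ontology_name, pair, axiom_type))
        if reply.strip().lower().startswith("yes"):
            accepted.append(pair)
    return accepted
```

The Direct approach would instead pack all candidates into a single prompt; the AbA variant trades more LLM calls for a narrower, easier question per call, which is consistent with the higher F1 scores the paper reports.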
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Europe > United Kingdom > England (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
ByteDance and DeepSeek Are Placing Very Different AI Bets
The diverging paths of China's two leading AI players show where the country's artificial intelligence industry is headed. DeepSeek and ByteDance, the two leaders of China's AI industry, are adopting vastly different strategies. On Monday, DeepSeek released DeepSeek V3.2, another open-weight model that anyone can tinker with. The startup says it performs on par with the latest models from OpenAI and Google, and it even beats them on some key mathematics benchmarks. That same day, ByteDance, whose dominance in AI applications we covered previously, introduced ways for people to use its chatbot, Doubao.
- Asia > China (0.81)
- North America > United States > California (0.14)
- Asia > Nepal (0.14)
- (3 more...)
- Information Technology > Services (0.98)
- Government > Regional Government > Asia Government (0.34)
OpenAI's Open-Weight Models Are Coming to the US Military
OpenAI's Open-Weight Models Are Coming to the US Military The gpt-oss models are being tested for use on sensitive military computers. But some defense insiders say that OpenAI is still behind the competition. When OpenAI unveiled its first open-weight models in years this August, it wasn't just tech companies that were paying attention. The release also excited US military and defense contractors, which saw a chance to use them for highly secure operations. Initial results show that OpenAI's tools lag behind competitors in desired capabilities, some military vendors tell WIRED.
- North America > United States > Illinois > Cook County > Chicago (0.05)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- (4 more...)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
'Sovereign AI' Has Become a New Front in the US-China Tech War
'Sovereign AI' Has Become a New Front in the US-China Tech War OpenAI has announced "AI sovereignty" partnerships with governments around the world, but can proprietary models compete with Beijing's open source offerings? OpenAI has announced a number of projects this year with foreign governments to help build out what it has called their "sovereign AI" systems. The company says the deals, some of which are being coordinated with the US government, are part of a broader push to give national leaders more control over a technology that could reshape their economies. Over the past few months, sovereign AI has become something of a buzzword in both Washington and Silicon Valley. Proponents of the concept argue it's crucial that AI systems developed in democratic nations are able to proliferate globally, particularly as China races to deploy its own AI technology abroad.
- Asia > China > Beijing > Beijing (0.25)
- North America > United States > California > Alameda County > Berkeley (0.05)
- Europe > Slovakia (0.05)
- (3 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.87)
This Startup Wants to Spark a US DeepSeek Moment
With the US falling behind on open source models, one startup has a bold idea for democratizing AI: let anyone run reinforcement learning. Ever since DeepSeek burst onto the scene in January, momentum has grown around open source Chinese artificial intelligence models. Some researchers are pushing for an even more open approach to building AI that allows model-making to be distributed across the globe. Prime Intellect, a startup specializing in decentralized AI, is currently training a frontier large language model, called INTELLECT-3, using a new kind of distributed reinforcement learning for fine-tuning. The model will demonstrate a new way to build competitive open AI models using a range of hardware in different locations in a way that does not rely on big tech companies, says Vincent Weisser, the company's CEO.
- North America > United States > California > San Francisco County > San Francisco (0.05)
- Europe > Slovakia (0.05)
- Europe > Czechia (0.05)
- (2 more...)
- Information Technology (0.35)
- Leisure & Entertainment > Games (0.31)
OpenAI Wants ChatGPT to Be Your Future Operating System
At OpenAI's Developer Day, CEO Sam Altman showed off apps that run entirely inside the chat window--a new effort to turn ChatGPT into a platform. On Monday, OpenAI unveiled a new way to embed third-party apps directly into ChatGPT. At the company's annual developer conference in San Francisco, CEO Sam Altman said the move would "enable a new generation of apps that are adaptive, interactive, and personalized, that you can chat with." Starting today, some developers will be able to use a preview version of a new Apps software development kit (SDK) to build apps within ChatGPT using open standards. The ability to distribute these apps is currently limited to a handful of big partners.
- North America > United States > California > San Francisco County > San Francisco (0.36)
- Europe > Slovakia (0.05)
- Europe > Czechia (0.05)
- Asia > China (0.05)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Character.AI Gave Up on AGI. Now It's Selling Stories
After school, Karandeep Anand often finds his 6-year-old daughter deep in conversation with an AI chatbot as she eats snacks at their kitchen counter. She's too young to type--let alone have her own account on Character.AI--but that hasn't stopped her from nabbing his phone to have voice conversations with a Sherlock Holmes bot, which she uses to build her own mystery stories. Character.AI is an AI companion startup (though Anand likes to say it's an AI role-play startup, which we'll get into later). He took over as the CEO in June in the midst of a potentially devastating lawsuit for its parent company and looming questions about child safety. When I ask if he's concerned about his daughter connecting with an AI chatbot rather than a real human, he's quick to say no.
Meta's AI Recruiting Campaign Finds a New Target
Mark Zuckerberg is on a warpath to recruit top talent in the AI field for his newly formed Meta Superintelligence Labs. After trying to gut OpenAI (and successfully poaching several top researchers), he appears to have set his sights on his next target. More than a dozen people at Mira Murati's 50-person startup, Thinking Machines Lab, have been approached or received offers from the tech giant. One of those offers was more than $1 billion over a multi-year span, a source with knowledge of the negotiations tells WIRED. The rest were between $200 million and $500 million over a four-year span, multiple sources confirm.
Evaluating Large Language Models on the Frame and Symbol Grounding Problems: A Zero-shot Benchmark
Recent advancements in large language models (LLMs) have revitalized philosophical debates surrounding artificial intelligence. Two of the most fundamental challenges - namely, the Frame Problem and the Symbol Grounding Problem - have historically been viewed as unsolvable within traditional symbolic AI systems. This study investigates whether modern LLMs possess the cognitive capacities required to address these problems. To do so, I designed two benchmark tasks reflecting the philosophical core of each problem, administered them under zero-shot conditions to 13 prominent LLMs (both closed and open-source), and assessed the quality of the models' outputs across five trials each. Responses were scored along multiple criteria, including contextual reasoning, semantic coherence, and information filtering. The results demonstrate that while open-source models showed variability in performance due to differences in model size, quantization, and instruction tuning, several closed models consistently achieved high scores. These findings suggest that select modern LLMs may be acquiring capacities sufficient to produce meaningful and stable responses to these long-standing theoretical challenges.
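The evaluation protocol above — five trials per model, scored along several criteria — amounts to averaging per-criterion scores over repeated zero-shot runs. A minimal sketch, with criterion names taken from the abstract but the data layout and function names invented for illustration:

```python
from statistics import mean

# Scoring criteria named in the abstract (the full rubric may include more).
CRITERIA = ["contextual_reasoning", "semantic_coherence", "information_filtering"]

def aggregate_trials(trial_scores):
    """Average each criterion over repeated zero-shot trials.

    `trial_scores` is one dict per trial mapping criterion -> numeric score.
    """
    return {c: mean(t[c] for t in trial_scores) for c in CRITERIA}
```

Averaging over repeated trials is what makes the "stable responses" claim testable: a model whose per-criterion scores swing widely across the five runs would show it even if its mean were high.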
Solve-Detect-Verify: Inference-Time Scaling with Flexible Generative Verifier
Zhong, Jianyuan, Li, Zeju, Xu, Zhijian, Wen, Xiangyu, Li, Kezhi, Xu, Qiang
Large Language Model (LLM) reasoning for complex tasks inherently involves a trade-off between solution accuracy and computational efficiency. The subsequent step of verification, while intended to improve performance, further complicates this landscape by introducing its own challenging trade-off: sophisticated Generative Reward Models (GenRMs) can be computationally prohibitive if naively integrated with LLMs at test-time, while simpler, faster methods may lack reliability. To overcome these challenges, we introduce FlexiVe, a novel generative verifier that flexibly balances computational resources between rapid, reliable fast thinking and meticulous slow thinking using a Flexible Allocation of Verification Budget strategy. We further propose the Solve-Detect-Verify pipeline, an efficient inference-time scaling framework that intelligently integrates FlexiVe, proactively identifying solution completion points to trigger targeted verification and provide focused solver feedback. Experiments show FlexiVe achieves superior accuracy in pinpointing errors within reasoning traces on ProcessBench. Furthermore, on challenging mathematical reasoning benchmarks (AIME 2024, AIME 2025, and CNMO), our full approach outperforms baselines like self-consistency in reasoning accuracy and inference efficiency. Our system offers a scalable and effective solution to enhance LLM reasoning at test time.
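The Solve-Detect-Verify pipeline described above can be sketched as a feedback loop: generate a solution, detect when it looks complete, verify it, and on failure route the verifier's critique back to the solver. This is an illustrative skeleton under assumed interfaces — `solver`, `detect_complete`, and `verifier` are stand-ins, not the paper's actual components or FlexiVe's API:

```python
def solve_detect_verify(problem, solver, detect_complete, verifier, max_rounds=3):
    """Sketch of the Solve-Detect-Verify loop.

    solver(problem, feedback) -> reasoning trace (str)
    detect_complete(trace)    -> True once the trace looks like a finished solution
    verifier(problem, trace)  -> (accepted: bool, critique: str)
    """
    feedback = None
    trace = ""
    for _ in range(max_rounds):
        trace = solver(problem, feedback)
        if not detect_complete(trace):
            # Completion point not yet reached: keep solving, don't verify.
            feedback = "Solution appears incomplete; continue reasoning."
            continue
        accepted, critique = verifier(problem, trace)
        if accepted:
            return trace
        # Targeted verification failed: feed the critique back to the solver.
        feedback = critique
    return trace  # best effort after max_rounds
```

Detecting completion before verifying is what keeps the expensive verifier off the critical path: it only runs at candidate solution endpoints rather than on every intermediate step.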
- North America > United States (0.04)
- Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > China > Hong Kong (0.04)